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Abstract: Latent variable models have been widely applied in different 
fields of research in which the constructs of interest are not directly 
observable, so that one or more latent variables are required to re- 
duce the complexity of the data. In these cases, problems related to 
the integration of the likelihood function of the model arise since an- 
alytical solutions do not exist. In the recent literature, a numerical 
technique that has been extensively applied to estimate latent vari- 
able models is the adaptive Gauss-Hermite quadrature. It provides a 
good approximation of the function to be integrated, and it is more 
feasible than classical numerical techniques in presence of many latent 
variables and/or random effects. In this paper, we formally investi- 
gate the properties of maximum likelihood estimators based on adap- 
tive quadratures used to perform inference in generalized linear latent 
variable models. 

Key words and phrases: Gaussian quadrature, Generalized linear mod- 
els, Laplace approximation, M-estimators. 

1. Introduction 

Models based on latent variables are used in many scientific fields, particu- 
larly in social sciences. For instance, in psychology, researchers often use concepts 
as intelligence and anxiety, that are difficult to observe directly, but that can be 
indirectly measured by surrogate data based on individual responses to a bat- 
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tery of tests. In economics, welfare and poverty cannot be measured directly; 
hence income, expenditure and various other indicators on households are used 
as substitutes. Factor analysis is probably the most well-known latent variable 
model, based on the assumption of multivariate normality for the distribution of 
the manifest and latent variables. It has been extended by numerous researchers 
in order to deal with survey data that generally contain variables measured on 
binary, categorical or metric scales, or combinations of the above. Moustaki and| 



Knott (2000) proposed a Generalized Linear Latent Variable Model (GLLVM) 
framework that allows the distribution of the manifest variables to belong to the 
exponential family, i.e. either continuous or discrete variables. 

The purpose of GLLVM is to describe the relationship between a set of 
responses or items y\, ■ ■ ■ , y p , and a set of latent variables or factors Z\, ■ ■ • , z q , 
that are fewer in number than the observed variables, but contain essentially 
the same information. The factors are supposed to account for the dependencies 
among the response variables in the sense that if the factors are held fixed, 
then the observed variables are independent. This is known as the assumption 
of conditional or local independence. The conditional distribution of ydz (z = 
[zi, ■ ■ ■ , z q ] T ) is taken from the exponential family (with canonical link functions) 



9j(Vj\ z ) = exp 



yj{a 0j + ajz) - bj(a 0j + ajz) 



+ c j (y j ,(f> j )\ j 



■ P-. 



where aoj is the item-specific intercept, ay = [ayi, ctj q ] T can be interpreted as 
factor loadings of the model, and 4>j is the scale parameter, that is of interest in 
the case of continuous observed components. The functions fej(-) and Cj(-, •) are 
known and assume different forms according to the different nature of yj. 
Under the assumption of conditional independence, the joint marginal distribu- 
tion of the manifest variables is 



with y = [y lr 



• y P \ 



e 



g(y\z)h(z)dz 



[aoi, • • • , «0p, £*! , • 



3=1 



h(z)dz 



XI) 



, ex . 



] T , and where z 



is generally assumed to be multivariate standard normal, but the independence 



ADAPTIVE ML ESTIMATORS IN GLLVM 



3 



assumption of the latent variables could be relaxed. 

GLLVMs are designed as a flexible modelling approach. As a consequence, 
they are rather complex models, and their statistical analysis presents some dif- 
ficulties, due to the fact that the latent variables are not observed. Maximum 
likelihood estimates in the GLLVM framework are typically obtained by using 
standard maximization algorithms, such as the EM and the Newton-Raphson 



algorithms (Moustaki and Knott, 2000 Huber et al. 2004). In both cases, the 
latent variables must be integrated out from the likelihood function, and numer- 



ical techniques have to be applied. Moustaki and Knott ( 2000 ) proposed the use 
of the Gauss-Hermite (GH) quadrature as a numerical approximation method. 
Although this is feasible in fairly simple models and tends to work well with mod- 
erate sample sizes, its application is often unfeasible when the number of latent 
variables increases. Moreover, GH can completely miss the maximum for certain 
functions and can be inefficient in other cases. To overcome these limitations, the 
Adaptive Gauss-Hemite (AGH) quadrature has become very popular in the la- 
tent variable literature. It allows to get a better approximation of the function to 
be integrated by adjusting the quadrature locations with specific features of the 
posterior density of the latent variables given the observations. Developed in the 



Bayesian context by Naylor and Smith (1982), it has been extended by several 



authors to deal with generalized linear mixed models. In particular, Schilling and 



Bock ( 2005 ) applied the AGH quadrature to approximate marginal likelihoods in 



IRT models with binary data, whereas Rabe-Hesketh et al. (2005) analyzed its 



behavior for generalized linear latent and mixed models. Furthermore, Joe (2008) 
compared the AGH with the Laplace approximation for a variety of discrete re- 
sponse mixed models. He found that the Laplace approximation becomes less 
adequate as the degree of discreteness increases and suggests to use AGH with 
binary and ordinal data. The prominent role of the AGH is also demonstrated by 
the fact that it is implemented in many statistical software used to fit GLLVM, 



such as in the function gllamm in STATA (Rabe-Hesketh and Skrondal, 2012), 



in MPLUS (Muthen and Muthen 2010), and in the PROC NLMIXED in SAS 



(Lesaffre and Spiessens 2001). However, to the best of our knowledge, inferen- 
tial issues on the properties of the estimators based on the adaptive quadrature 
have not been addressed in the literature. In this paper, we formally investigate 
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these theoretical properties as function of both the sample size and the number 



of observed variables. Our results generalize those by Huber et al. (2004), who 
analyzed the properties of classical Laplace based estimators in GLLVM. Indeed, 
we show that the adaptive Gauss-Hermite quadratures share the same error rate 
of the higher (than one) order Laplace approximation. 

The paper is organized as follows. In Section 2, we discuss the estimation of 
GLLVMs when the adaptive Gauss-Hermite quadrature is applied to approxi- 
mate integrals. In Section 3, the relationship between AGH quadratures and the 
Laplace approximation is analyzed, and the asymptotic properties of the adap- 
tive ML estimators are derived. A simulation study is implemented in Section 4 
to analyze the finite sample properties of the estimators. Finally, in Section 5, a 
brief summary on the main findings of the paper is provided. 



2. Estimation based on adaptive Gauss-Hermite quadrature 

Maximum Likelihood (ML) estimates in the GLLVM framework are typically 
obtained by using either the EM or the Newton-Raphson algorithms. The key 
component for applying both the algorithms is the score vector of the observed 
data log-likelihood function. For a random sample of size n, the latter is defined 

as 



£(9) = J>g/(yi,0) 



y^iog / rT ex p 

l=t J ^jA 

(2vr)- 9 / 2 exp ' 



Vjliaoj + ajzi) - bj(a 0j + ajz t ) 



T 
z l z l 



dzi. 



(2.1) 



It is easily shown that the score vector corresponding to expression (2.1) equals 



S(9) 



dl(0) 
d6 
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g/(^jC^ log{5(ydz ' )/l(Zi)}dZi 

Jm s ii 9 ^ z i)9{yi I zi)h(zi)dzi ^ 
f-T / R , 5(y; I zi)/i(z ; )dz z 



n ,. n 

= S l (0;z l )h(z l \y l )dz l = Y J E zly lS l (0,z l )} = E zly [S(0)} 

1=1 J Rq i=i 

where Si(6; z/) denotes the complete-data score vector given by d log f(yi, 6)/d6 = 
d[\ogg(yi\zi) + log h(zi)] /d6. In words, the observed data score vector is ex- 
pressed as the expected value of the complete-data vector with respect to the 
posterior distribution of the latent variables given the observations. This implies 
that eq. (2.2) plays a double role. If the score equations are solved with respect 
to 9, with h(zi\yi) fixed at the 0-value of the previous iteration, then this cor- 
responds to the EM algorithm, whereas, if the score equations are solved with 
respect to 6 considering h(zi\yi) also as a function of 6, then this corresponds to 
a direct maximization of the observed data log-likelihood £(0). As we shall dis- 
cuss further, based on this appealing feature, the estimators derived by applying 
either of these two algorithms will share the same theoretical properties. 



Eq. ( 2.2 ) involves ratios of multidimensional integrals which cannot be solved 
analytically, except when all the gj{yji\zi) are normal. Consequently, an approx- 
imation of these integrals is needed, on which the bias and variance of resulting 
estimators will depend. In this paper, we study the properties of ML estimators 
based on the adaptive Gauss-Hermite approximation of integrals. This technique 
consists of adjusting the quadrature locations with specific features of the pos- 
terior density of the latent variables given the observations. This allows to get a 



better approximation of the function to be integrated. Naylor and Smith ( 1982 ) 
took the mean vector and covariance matrix of the normal density approximating 
the integrand to be the posterior mean and covariance matrix. Unfortunately, 
these posterior moments are not known exactly, but must themselves be obtained 
using adaptive quadratures. Integration is therefore iterative. To overcome this 



limitation, Liu and Pierce (1994) proposed an alternative procedure that consists 
in computing the mode of the integrand and its curvature (inverse of the Hessian 
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matrix) at the mode, so that numerical integration is avoided. In this case, the 
adaptive quadrature, when applied using one abscissa, is equivalent to the classi- 
cal Laplace approximation, and its behavior has been analyzed in several papers 



on generalized linear models (Pinhero and Bates, 1995; Schilling and Bock 2005 



Skrondal and Rabe-Hesketh, 2004 Joe, 2008). 



The application of the adaptive quadrature requires to rewrite eq. ( 1.1 ) as follows 



where h\{-\ z,i, S&i) is a multivariate normal density with moments 



zi = arg max [logflr(y|zj) + log/i(z /y 



d 2 [logff(y|z;) + \ogh(zi)] 
dzjdzi 



(2.3) 



(2.4) 



(2.5) 



Zi=z ; 



A cartesian product rule based on the classical Gauss-Hermite quadrature is 
then applied so that the integrals have to be defined with respect to uncorrelated 
variables z\. Based on the Cholesky factorization of the covariance matrix = 



TiTf, expression (2.3) can be rewritten as 



/(yj,0) = 2§|T Z | f 5 (yi|v / 2T i z / + z i )/ i (\/2Tjz / + z / )exp [zfzj] exp [-zfz,] dz u 

such that the AGH approximation of the density f(yi,0),l = 1, • • • , n, is given 
by 

(2.6) 



where ... t = Ylt 1= i ' ' ' Ylt =1j being k the number of quadrature points 



selected for each latent variable, z* ti ... t 



\ Z l,t! ) " " " ) Z lj Q 



\^2Ti(z tl , • • • , z tq ) T + z/ and wZ = wt k exp[,z|J are the AGH nodes and weights, 
respectively, with zt k being the classical GH nodes and u>t k ,k = l,...,q, the 
corresponding weights. 



From eq. (2.6), we obtain the approximated log-likelihood function 
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1(6) = ^log 



„ ( Vji ( a oj + "Jz;,*!,... ,tj - h (aoj + «j % tll - ,tj 

23 i t <! e n ex p — ^ — + ^) 



(2tt) 9/: "exp I -~z|£,... ,t„ z Mi, 



The estimators of the model parameters are found by equating the corresponding 

zero, that is 

i df( yi ,e) 



derivatives of eq. (2.7) to zero, that is 
dl{9) 



S{6) 



de 



E 



J/(yi,«) fle 



E 



•to* 



Et!,... »(yil< tll ... ,t>( z ti,- • • • 

^4|y[W^,)]=4[y W)]=o (2i 



where, specifically, 



Si{aoj,z*i, 



Vji 



dbj [a 0j + ajz* lti ... tq 



da, 



Si( 



"7,tl ,t q 



and 



(2.7) 



l r 



Vji [a j + ajz*,*!,...,*,) - bj (a 0j - a] z*^... ^ 



+ 



dcj(yji,(f>j) 



Equations (2.8) provide a set of estimating equations defining the estimators 



for the model parameters. The same equations are derived in the E-step of the 
EM algorithm, in which the AGH quadrature is applied to approximate the E- 



step expectations (2.2). In the M-step, as in the direct maximization algorithm, 



improved estimates for the model parameters are obtained by maximizing the 
approximated expected score functions (2.8). For the scale parameter <j)j, closed 



form expressions can be derived, whereas, for the other parameters, a Newton 
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Raphson iterative scheme is used in order to solve the corresponding nonlinear 
maximum likelihood equations. In the derivation of the estimating equations, the 
model has been kept as general as possible without specifying the conditional dis- 
tributions gj(yj\z). In Appendix C, we give specific expressions for the quantity 
that are used in the likelihood function (2.1) and in the score functions (2.8) for 



binary manifest variables, whereas we refer to Bianconcini and Cagnone (2012) 
for count and categorical observed variables. 



3. Statistical properties of the AGH-based estimators 



To investigate the asymptotic properties of the maximum likelihood estima- 
tors based on the adaptive Gauss-Hermite quadrature, the error rate associated 



to the approximation (2.8) has to be determined. Liu and Pierce (1994) analyzed 



the asymptotic behavior of the AGH when it is used to approximate unidimen- 
sional integrals. Based on the fact that when applied with only one node it 



results in the Laplace approximation to the integral (De Bruijn, 1981 Barndorff- 



Nielsen and Cox, 1989), they proved that the adaptive quadrature based on k 



points can be alternatively thought as a higher (than one) order Laplace approx- 



imation. We now generalize this result to the multidimensional integral (1.1) as 



well as to the ratio of integrals (2.2), and we analyze the asymptotic accuracy of 



the corresponding Laplace approximations. The behavior of the latter for mul- 



tidimensional integrals was studied by Barndorff-Nielsen and Cox (1989), Shun 



and McCullagh (1995), Shun (1997), and recently by Evangelou et al. (2011) 



for spatial generalized linear mixed models. Similarly, Raudenbush et al. (2000) 



considered improvements of the standard Laplace approximation obtained by in- 
corporating higher order derivatives of the integrand. 



For the derivations illustrated here, we follow the notation of Shun and McCul- 



lagh (1995) based on summation convention. Hence, an index that appears as 



a subscript and as a superscript implies a summation over all possible values of 
that index. We will denote the components of a vector sometimes by subscripts 
and sometimes by superscripts. The (i,j)th component of a matrix A will be 
written as aij and its inverse (when exists) will have components a 13 . For any 
real function /(z),z £ M 9 , its derivative with respect to the ith component of z 
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is denoted by a subscript, i.e. fi(z) = , fij(z) = q z q z . , and, more generally, 
/n,- ,i2m ( z ) = w~^z^~- ^ n or der to keep the notation as light as possible, we 

H *2m 

omit the individual subscript I. 



3.1. Relationship with the Laplace approximation 



The AGH quadrature implemented here is based on a tensor product of q 
univariate Gaussian quadratures based on the same number of quadrature points. 



In each dimension, the approximation (2.6) is exact for polynomials of degree 



2k + 1 or less. Hence, it provides a good approximation of the integral (1.1) if 
the ratio v(z) 



faT0^&s c a n be approximated well by a g-variate polynomial, 



where the maximum exponent of all the monomials is at most 2k + 1 (Tauchen 



and Hussey, 1991), in the region where the integrand is substantial. It follows 



that the effectiveness of the adaptive Gauss-Hermite approximation (2.6) can be 



evaluated by considering the Taylor series expansion of v{z) around the mode z, 
that is | 

1 + Y] — ycii,-,<m( g )( a 
mi 



viz 



v(z) 



m=3 



where (ii, 



t ) is a set of m indices, Ci lt ... ,j ro (z) 



(3.1) 



,(2) 



, v m) id a ocu ui ,/(, muitcD, c ni ... )Jm ^ ; — ^ , ,i m (z) de- 
notes the partial derivatives of order m of f with respect to z% x , ■ ■ ■ , Zi m evaluated 
at the mode z, whereas (z — z) n '"'' im refers to specific components of the vector 
(z — z). The coefficients and Cj Xi j 2 are zero due to the choice of h\{-; z, *S>). 



Substituting the expansion (3.1) into the integral (1.1), we obtain the exact so- 
lution 



EE 



1 



m=2 



(2m)! 



,«2i 



^z)^ 1 ^)---^" 1 ^) 



(3.2) 



where the second sum is over the partition Q = qi \ ■ ■ ■ \ q m of 2m indices into m 
blocks, each of size 2, and v qk (z), k = 1, m, are components of the covariance 
matrix The Gauss-Hermite quadrature, for which k quadrature points are 
selected for each dimension, would be exact if the partial derivatives beyond the 
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2(k + 1) order in (3.2) are zero, that is 



f(y,6) = u(z) 



k 1 
1 + E E Jzmji. c h 



m=2 Q 



■'l>r, 



i/ 9m (z) 



(3.3) 



To determine the asymptotic order of the approximation (3.3), its relationship 



with the higher order Laplace approximation of multidimensional integrals has 



to be taken into account. At this regard, the integral (1.1) has to be rewritten 
as 

f(y,0)= [ e^ L ^dz (3.4) 
Jm. q 

where L(z) = — [log<?(y|z) + log/i(z)], such that L(z) = 0(p). Assuming that 
L(z) has a unique minimum z, Shun and McCullagh ( |1995 ) suggested the fol- 
lowing expansion around that minimum 



oo 1 

L(z) = I(z) + ^-L il ,., im (z)(z-^ 

m=2 

and applying the exponential function 
e - L ^ = (2^)' ? / 2 |*| 1 / 2 e- L(2) /ii(z;z,*)exp 



^^...jzllz-z)"-' 



m=3 



to! 



where h\{-; z, \&) is a multivariate normal density with moments given in eq. (2.4) 



and (2.5). Based on exlog relations, the higher order term can be expressed as 
follows 



exp 



00 (--\) 



m=3 



»1V" ,«», 



m=3 P 



such that the exact solution of the integral (3.4) is given by 



(2vr)2|*|l e - L(2) 



°° (-11* 

1 - E E j^jy^ (*) • • • A* ( 2 ) L91 • • • LQm (*) 



m=2 P,Q 



(3.5) 



where the second sum is over all partitions P, Q, such that P = p\\ ■ ■ ■ \pt is a 
partition of 2m indices into t blocks, each of size 3 or more, and Q = qi \ ■ ■ ■ \ q m 
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is a partition of 2m indices into m blocks, each of size 2. Each component 
L Qk (z), k = 1, • • • , m, refers to specific elements of As shown in the Appendix 



A, the exact solution (3.5) is equivalent to the one derived in eq. (3.2). It 



follows that the asymptotic order of the AGH approximation can be derived by 
truncating at m = k the expansion (3.5), and by analyzing the asymptotic order 
associated to the bipartition (P, Q) related to m = k + 1. For fixed q, the usual 



asymptotic order of the term corresponding to the bipartition (P,Q) in (3.5) is 
0(p t ~ m ). It follows that the asymptotic error of the AGH based on k quadrature 
points is the same associated to the bipartition (P, Q) of 2(k + 1) indices, that is 
O (p^^+A^ (see in Appendix A for more details). 

It has to be noticed that when AGH quadratures are applied in the estimation 



of GLLVM, we need to approximate ratios of integrals as shown in eq. (2.2). The 



fully exponential solution (3.5 ) cannot be applied to the integral at the numerator, 



since the score functions S(0; z) are not necessarily positive. The integral has to 



be written in the standard form (Tierney et al. 1989 Evangelou et al. 2011) 



l e~ L ^S(0;z)dz. 
Jm 



Beyond the Taylor series expansion of L{z) around its minimum z, we have to 
consider a similar expansion of S around the same point, that is 



s(e-,z) = ^s h ,.., jm (e-,z)(z-zy^-^ 



m=0 



Following Evangelou et al. (2011), it can be shown that 



S(0;z)g(y\z)h(z)dz 

oo 2m 



(27r)2 |*| 2e 



EE £ 



P,Q 



m=0 s=0 



(2m)\ 



S jl ,.. Js (0-,z)L Pl (z)...L pt (z)L q HZ)---L q " 



where P is a partition of 2m — s indices into t blocks, each of size 3 or more, and 
Q is a partition of the same indices together with ■ ■ ■ ,j s } into m blocks of 
size 2. Note that P and Q do not need to be connected. It follows that the exact 



Laplace solution of the expected score function (2.2) results 
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Z™=o Eg) Ep,q &S n ,.. j. (0; z)L Pl (z) • • • L Pf (S)£« (z) • • • L*» (z) 
E™=o Ep,q (2) • • • L pt (z)L^ (z) • • • L*» (z) 

It will be perfectly account for the AGH approximation in eq. ( |2.8[ ) if the partial 
derivatives, at both the numerator and denominator, of order greater than 2k 
(maxm = k) are zero. The corresponding Laplace approximation can be rewrit- 
ten by regrouping in decreasing asymptotic order the scalar that appear in both 
the expansions, and by truncating the resulting series at an appropriate point. 
In symbols, 



5(0; z) + c\p~ x + ■■■ + c* r p~ r + ■■■ + c* [k/3]P ^ k W + O (V [t +1 J 



1 + Cip- 1 + ■■■ + C r p~ r + ■■■ + C [k/3] p-^ k /^ +0[p 

where the coefficients c r , r = 1, [|] , are given by 



[l+i] 



(3.7) 



3r 



(— 1 Y 

°r= E { -v^L Pl (z)...L Pm _ r (z)L^(z)...L^(z) 

m=r+l 

with pi | • • • \pt be a partition of 2m indices into m — r blocks, each of size 3 or 
more, and qi\ ■ ■ ■ \q m is a partition of 2m indices into m blocks, each of size 2. 
On the other hand, the coefficients c*, r = 1, [|] , results 

C * = E E ^^^,^.(fl;£)i^(fi)---l^ 1 .(«)L«(8)...L*-(.) 

where pi | • • • \p m -r is a partition of 2m — s indices into m — r blocks, each of 
size 3 or more, and q\\ ■ ■ ■ \q m is a partition of the same indices together with 
Oil"" )Js} hrio m blocks of size 2. Since Sj lt ... j g (6;z) = aLn '' d Q a ^ , all the 
first derivatives of the score function will be zero due to the choice of z. 
Based on long polynomial division, the approximated expected score functions 



equivalent to (2.8) are given by 



n 

E z \ y [S(0)} = [Si(0, k) + cTp- 1 + ■■■ + c* r *p- r + • • • + O (V (3. 
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C** = [c* — 5(0; z)c r ] — C r -ic{ — C r -2C2 



c±c r _i 



being 



3r 3r— m 



C ;-S(0;z)c T =Y J E 



(-1)' 



m=r s=2 



(2m)! 



- S n ,.. , js (0; z)L Pl (z) • • • L Pm _ r (z)L^ (z) • • • L*» (z) 



and, in particular, c| — 5(0; z)ci = \Sj 1 j 2 lJ 1 ^ 2 {i). 



3.2. Asymptotic behavior of the AGH-based estimators 

To investigate the properties of the AGH approximated maximum likelihood 
estimators 9, we analyze the asymptotic behavior of the corresponding Laplace- 



based estimators defined by (3.8). Our arguments are similar to those of Huber 



et al. (2004), who discussed classical Laplace estimators in GLLVM, and Ri 



zopoulos et al. ( 2009 ) who analyzed the consistency of fully exponential Laplace 



estimators in joint models for survival and longitudinal data. At this regard, let 
Qq denote the true parameter value, then, under suitable regularity conditions, 
it is shown in the Appendix B that 



(0 - O ) = O p 



max ! ii 1 ^ 2 ,p 



f+1 



(3.9) 



Thus, is consistent as long as both n and p grow to oo. The ra -1 / 2 term comes 
from the standard asymptotic theory, whereas the term derives from the 

AGH approximation. For k > 3, the AGH-based estimator is more accurate 
than the classical 0(p^ 1 ) Laplace-based estimators. Indeed, it shares the same 
accuracy of higher (than one) order Laplace estimators, but, with respect to this 
latter, the adaptive Gauss-Hermite is easier to be implemented, since it avoids 
derivative computations. 



Based on the derivation of eq. (3.9) as presented in the Appendix B, we can 
deduce that, if p= 0(n p ) for p > pr~~j7i then the AGH-based estimators will be 
asymptotically equivalent to the true maximum likelihood estimators that solve 
5(0) = 0. However, in general, they are not maximum likelihood estimators 
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because of the approximation, but, as discussed by Huber et al. (2004) for classical 



Laplace estimators in GLLVM, they belong to the class of M-estimators . The 
latter are implicitly defined through a general ^-function as the solution in 6 of 



^( yi ;6) = 0. 



1=1 



The ^-function for the AGH-based estimators are given by eq. (2.8). Then, 



under the conditions that are given by Huber ( 1981 ), the AGH-based estimators 
are asymptotically normal, i.e. 

n{6 - G ) ^ D MVN(0, B(0 Q )- 1 A(0 )[B(0 )- 1 ] T ) 



as n — > oo, where 



A(0 ) = £[*(y,;0 o )* T (yz;0o)], 

"0tf(yi;0o)' 



B(G ) 



-E 



86 



These conditions must be checked for the particular conditional distribution of 



each yj (Huber et al. 2004). 



4. Monte Carlo simulations 

In this section, we investigate empirically the finite sample performance of 
the adaptive Gauss-Hermite and related Laplace-based estimators. We focus on 
latent variable models for binary data, since in this case the differences between 



numerical techniques should be better highlighted ( Joe 2008 ) . We consider two 
simulation scenarios characterized by an increasing number of observed and latent 
variables. In particular, we generate data from a population that consists of 
six items satisfying a three factor model, and from a population based on ten 
observed variables that satisfy a five factor model. In both cases, the population 
parameters have been chosen in such a way that the item-specific intercepts and 
the factor loadings are drawn randomly from a log-normal distribution, with 
some loadings fixed to to get unique solutions. For each scenario, 100 random 
samples have been considered with 200 subjects. 
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A crucial choice in the application of the AGH quadrature is the number of 
points needed to adequately approximate the likelihood function. In presence of 



binary data, Schilling and Bock ( 2005 ) suggested to select five quadrature points 
in presence of three latent factors, and three nodes for the five factor model. In 
both cases, the AGH shares the same asymptotic error of the Laplace approxima- 
tion of order 0(p~ 2 ). For a different number of binary items (ranging from 8 to 
15) observed over an increasing number of individuals (ranging from 50 to 1000), 
this relationship is illustrated in Figure (4.1). The latter shows the differences in 
the log-likelihood functions of the two- (left) and three- (right) factor models that 
are approximated using the AGH based on one (first row), three (second row), 
and five (third row) quadrature points, compared with the standard Laplace 
and the second order Laplace approximation. As expected, independently on 
the number of individuals, observed items and latent variables, when applied 
with just one quadrature node the AGH provides the same approximation as the 
standard Laplace technique. On the other hand, in this case, the second order 
Laplace approximation differs from the adaptive quadrature with higher differ- 
ences for the model with three latent factors. The discrepancy between these two 
techniques drastically reduces as a larger number of nodes is used in the adaptive 
quadrature, independently on the number of factors, observed items and individ- 
uals. In the simulation study, we apply the adaptive quadrature using five and 
three quadrature points in the first and second scenario, respectively. Hence, 
its performance is compared with the second order Laplace approximation. The 
estimation is performed through the direct maximization algorithm described in 
Section 2, whose mathematical details for the case of binary observed items are 
provided in Appendix C. The algorithm is written in the statistical language R 
(R Development Core Team, 2010) and the program is available from the authors 
on request. 



Table 4.1 illustrates the mean, bias, and Mean Square Error (MSE) of the param- 
eter estimates obtained by applying both the techniques to the data generated 
by the three factor model. The results show that the AGH based on five quadra- 
ture points and the second order Laplace provide similar MSE values for all the 
model parameters. The Laplace seems to introduce a slightly larger bias in some 
estimates than the AGH, but a smaller variability with respect to the latter. On 
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the other hand, Table |4.2| shows the results for the five-factor models. In this 
specific case, the adaptive Gauss- Hermite has been applied with three quadra- 
ture points. We can notice that the results of the latter are very close to the one 
derived using the second order Laplace approximation, with similar conclusions 
to those drawn for the first scenario. 

Table 4.1: Mean, bias and MSE of the parameter estimates for AGH based on 5 quadra- 
ture points and second order Laplace (Lap2) approximation in data generated by the 
three factor model. 









AGH 






Lap2 




True 




Mean 


Bias 


MSE 


Mean 


Bias 


MSE 


an = 


1.01 


0.72 


-0.29 


0.22 


1.32 


0.31 


0.59 


<*2i = 


0.91 


1.17 


0.26 


0.20 


1.11 


0.20 


0.18 


C*31 = 


0.50 


0.39 


-0.12 


0.11 


0.35 


-0.16 


0.14 


c*4i = 


0.74 


0.99 


0.25 


0.20 


1.14 


0.40 


0.19 


C*51 = 


1.16 


1.39 


0.23 


0.19 


1.68 


0.53 


0.41 


C*61 = 


1.22 


1.54 


0.33 


0.30 


1.23 


0.02 


0.27 


«12 = 


0.00 














»22 = 


0.83 


0.45 


-0.38 


0.24 


0.21 


-0.63 


0.86 


C*32 = 


0.44 


1.02 


0.58 


0.48 


1.06 


0.61 


0.51 


C*42 = 


0.88 


1.15 


0.27 


0.25 


1.13 


0.25 


0.29 


as2 = 


1.73 


2.54 


0.80 


0.93 


2.53 


0.79 


0.77 


a§2 = 


1.46 


1.43 


-0.03 


0.27 


1.36 


-0.10 


0.54 


C*13 = 


0.00 














C*23 = 


0.00 














C*33 = 


1.45 


1.08 


-0.37 


0.33 


1.21 


-0.23 


0.52 


C*43 = 


1.05 


1.52 


0.48 


0.41 


1.49 


0.44 


0.33 


C*53 = 


0.62 


0.93 


0.31 


0.23 


0.80 


0.18 


0.17 


C*63 = 


0.91 


0.98 


0.08 


0.19 


0.53 


-0.38 


0.26 



5. Conclusions 

In this paper, we have investigated the theoretical properties of adaptive 
Gauss-Hermite based estimators in the GLLVM framework. Recently, the adap- 
tive quadrature has played a prominent role in the latent variable model lit- 
erature for approximating integrals defined over the latent space. It allows to 
overcome the main limitations of the commonly used techniques, such as the 
Gauss-Hermite quadrature and the standard Laplace approximation. Indeed, 
AGH is applicable to problems involving high-dimensional integrals where the 
former becomes impractical or computationally intensive, and it provides more 
accurate estimates than the latter, particularly when used for binary or ordinal 



ADAPTIVE ML ESTIMATORS IN GLLVM 



17 



Figure 4.1: Differences in the log-likelihood functions approximated using AGH based 
on one/three/five nodes for each dimension, second order Laplace (blue) and standard 
Laplace (pink) approximations for two (left) and three (right) factor models. 

One quadrature point for dimension 




Three quadrature points for dimension 




Five quadrature points for dimension 



18 



SILVIA BIANCONCINI 



Table 4.2: Mean, bias and MSE of the parameter estimates for AGH based on 5 quadra- 
ture points, and second order Laplace (Lap2) approximation in the data generated by 
the five factor model. 









AGH 






Lap 2 




True 




Mean 


Bias 


MSE 


Mean 


Bias 


MSE 


on = 


1.01 


0.70 


-0.31 


0.21 


1.28 


0.27 


0.28 


"21 = 


0.91 


1.27 


0.36 


0.20 


1.33 


0.42 


0.44 


"31 = 


0.50 


0.87 


0.37 


0.30 


0.57 


0.06 


0.05 


Q 4 1 = 


0.74 


1.14 


0.40 


0.31 


0.85 


0.11 


0.12 


"51 = 


1.16 


1.83 


0.67 


0.52 


1.98 


0.82 


0.90 


"61 = 


1.22 


1.22 


0.00 


0.15 


0.66 


-0.55 


0.36 


an = 


0.55 


0.48 


-0.07 


0.10 


0.59 


0.04 


0.05 


"81 = 


0.83 


1.10 


0.27 


0.19 


0.90 


0.07 


0.07 


"91 = 


0.44 


1.01 


0.57 


0.40 


1.05 


0.61 


0.42 


"101 = 


= 0.88 


1.05 


0.17 


0.12 


1.17 


0.29 


0.11 


"12 = 


0.00 


- 


- 


- 


- 


- 


- 


"22 = 


1.46 


1.39 


-0.07 


0.07 


1.21 


-0.25 


0.25 


"32 = 


0.89 


0.59 


-0.30 


0.34 


0.80 


-0.08 


0.15 


Q42 = 


1.64 


1.27 


-0.37 


0.19 


1.37 


-0.27 


0.22 


"52 = 


1.45 


0.60 


-0.85 


0.84 


0.59 


-0.86 


0.96 


"62 = 


1.05 


0.92 


-0.12 


0.15 


1.06 


0.02 


0.04 


"72 = 


0.62 


0.68 


0.06 


0.18 


0.80 


0.18 


0.15 


"82 = 


0.91 


0.40 


-0.50 


0.36 


0.19 


-0.72 


0.69 


Qg 2 = 


1.59 


1.22 


-0.36 


0.23 


2.02 


0.43 


0.54 


01102 = 


= 1.27 


0.95 


-0.32 


0.21 


1.22 


-0.05 


0.07 


"13 = 


0.00 


- 


- 


- 


- 


- 


- 


"23 = 


0.00 


- 


- 


- 


- 


- 


- 


"33 = 


0.71 


1.10 


0.38 


0.35 


1.13 


0.42 


0.43 


"43 = 


0.35 


1.02 


0.67 


0.54 


1.01 


0.66 


0.46 


"53 = 


0.53 


1.46 


0.93 


0.94 


1.46 


0.93 


0.90 


"63 = 


0.83 


0.97 


0.15 


0.18 


0.64 


-0.19 


0.10 


"73 = 


0.71 


1.12 


0.41 


0.30 


1.52 


0.81 


0.80 


"83 = 


0.65 


1.36 


0.72 


0.60 


1.27 


0.63 


0.62 


"93 = 


0.95 


1.19 


0.24 


0.15 


0.84 


-0.11 


0.04 


"103 = 


= 0.88 


1.23 


0.35 


0.27 


0.68 


-0.20 


0.07 


"14 = 


0.00 


- 


- 


- 


- 


- 


- 


Q24 = 


0.00 


- 


- 


- 


- 


- 


- 


"34 = 


0.00 














"44 = 


1.10 


1.42 


0.32 


0.24 


1.95 


0.86 


0.90 


Q54 = 


0.50 


0.84 


0.34 


0.20 


0.95 


0.45 


0.53 


Q 6 4 = 


0.49 


0.93 


0.45 


0.37 


0.05 


-0.44 


0.31 


074 = 


1.20 


0.67 


-0.52 


0.54 


0.50 


-0.70 


0.54 


"84 = 


0.41 


0.43 


0.02 


0.13 


0.11 


-0.30 


0.19 


Q94 = 


0.85 


0.82 


-0.03 


0.14 


0.43 


-0.42 


0.22 


"104 = 


= 0.72 


1.03 


0.31 


0.23 


1.11 


0.38 


0.19 


"15 = 


0.00 














"25 = 


0.00 














"35 = 


0.00 














045 = 


0.00 














"55 = 


0.62 


0.80 


0.19 


0.11 


0.53 


-0.09 


0.13 


"65 = 


0.99 


1.32 


0.34 


0.22 


1.23 


0.24 


0.45 


"75 = 


1.12 


1.04 


-0.08 


0.14 


0.97 


-0.16 


0.15 


"85 = 


0.86 


1.05 


0.19 


0.20 


1.42 


0.56 


0.59 


"95 = 


0.71 


0.67 


-0.04 


0.11 


0.94 


0.23 


0.10 


"105 = 


= 1.39 


1.32 


-0.07 


0.13 


1.73 


0.34 


0.41 
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data with small-sample sizes ( Joe , 2008 ) 



We have proved that, for multidimensional integrals, the AGH solution is 
asymptotically equivalent to the Laplace approximation that involves specific 
higher (than two) order derivatives of the integrand. Higher order Laplace ap- 
proximations have been suggested in several papers on generalized linear models 



Raudenbush et al. 


2000 


Evangelou et al. 


2011 


Bianconcini and Cagnone 


2012 



as an alternative to classical methods for improving the accuracy of the estimates. 
This extension has been motivated by the well-known asymptotic properties that 
characterize the Laplace method, and by the fact that the approach does not suf- 
fer from the "curse of dimensionality". However, the inclusion of higher order 
terms is computationally demanding as the order of the approximation increases. 
On the other hand, the AGH quadrature is easier to be implemented, but of 
course its computational complexity increases as the number of latent variables 
increases. Hence, AGH and higher order Laplace approximations can be seen as 
complementary approaches that share the same asymptotic properties. 

We have shown that the AGH-based estimators are consistent as the sample 
size and number of observed variables grow to infinity. The convergence rate 
of these estimators depends also on the number of quadrature points used for 
each dimension. In general, these estimators are less efficient than maximum 
likelihood estimators because of the approximation, but belong to the class of 
M-estimators, for which the asymptotic properties are well-known such that 
correct inference can be performed. 
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Appendix A: Asymptotic behavior of the multivariate AGH approxi- 
mation 



The higher order Laplace approximation of eq. (1.1) is derived by considering 



where L(z) = — [log<?(y|z) + logfo(z)], being L(z) = 0(p). It is based on the 
Taylor series expansion of L around its minimum z, that is 



1 ^ i 

L(z) = L(z) + -L n , 2 (z)(z - z)^ 2 + — { W,-,i m m 

m=3 



(1) 



Substituting ([I]) into the integral, we obtain 



exp[-L(z)]dz = (27r)5|*|5e" i(£) / /n(z; z, *) exp 



A ^ 7-7-1 I 



(27r)5|*|5e _ 



1 - E E \A L ^ ^ ■ ■ ■ L « ^ LQ1 ^ ■ ■ ■ L9m & 



where the second sum is over all partitions P, Q, such that P = Pi\ - ■ - \pt is a 
partition of 2m indices into t blocks, each of size 3 or more, and Q = qi \ ■ ■ ■ \q m 
is a partition of 2m indices into m blocks, each of size 2. Each component 
L qk (z), k = 1, • • • , m, refers to specific elements of the covariance matrix 
We want here to show that the exact higher order Laplace solution for the integral 



(1.1) is equivalent to the one based on the AGH quadrature given in eq. (3.2). 



<7z 
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To do so, we need to show that 

oo 

l+^S-l) 1 L Pl (z)...i pt (z)(z-z)' 1 - 



m=3 P 



m=3 



(z)(z 



(2) 

At this regard, we can notice that, based on the exlog relations, the LHS term of 
eq. Q is equal to exp [Ylm=3 ^h,- ,i m (^)( z ~ z) n '"' ,im ] . This higher order term 
can be rewritten as 



7r(z) = exp 
that it is equal to 

(27r)-8|¥|-30(y|z)/i(z 



L(z) + L(z) + -L Ma (z)(z - z 



g(y|z)/i(z)/ii(z;z,*) 



Kz) 



c z 



Hence, the Taylor series expansion of vr(z) around the minimum z can be written 
as 



7r(z) 



oo ^ 

1 + Y —,*iu 
ml 

m=3 
oo ^ 

1 + Y, 

L — / ml 



.(*)(■ 



m=3 



It follows that the AGH solution (3.2) and the Laplace one (3.5) are equivalent. 
Based on this relationship, it is possible to derive the asymptotic error associated 



with the AGH approximation (3.3) evaluating the equivalent Laplace approxima- 



tion obtained by truncating (3.5) at m = k. Shun and McCullagh (1995) proved 
that, for fixed q, the usual asymptotic order of the term corresponding to the 
bipartition (P, Q) is 0(p t ~ m ). The error rate of the AGH based on k quadrature 
points is the same associated to the bipartition (P, Q) of 2(k + 1) indices in the 



expansion (|3.5|). In this case, the maximum number of blocks, each of size at 

"2(fc+l)" 



least 3, for 2(k + 1) indices is ; where [r] indicates the largest integer not 

exceeding r. Hence, being m = k + 1, the AGH based on k quadrature points 
has associated asymptotic order equal to O 
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Appendix B: Consistency of the AGH-based estimators 



This section concerns with the consistency of the AGH based estimators. All the 



following proofs proceed along the lines of Vonesh (1996), who derived the rate 



of convergence of the estimator based on the classical Laplace approximation for 



nonlinear mixed effect models, and of Rizopoulos et al. (2009), who derived that 



rate for fully exponential Laplace based estimators in joint models for longitudinal 
and survival data. In particular, we work under the following assumptions: 

1. f(y,0) is a well-defined density under the usual regularity conditions; 

2. the true parameter value Oq is an interior point of the parameter space, and 
the Laplace estimator 6 is an interior point in a neighborhood containing 
#0 ; 

3. z = argmax2 g iRg[logg(y|z) + l°gM z )] exists for all i = 1, • • ■ , n. 

Let S(-) denote the approximated score vector according to the approximations 



(3.8); then we obtain 



E zly [S(9)] = S(e) 
n- l S(0) 



Tl 

£{^,zO + --- + o(p-[f +1 ])} 



i=i 

n - 1 s{e) + o( P 



(1) 



since is chosen such that E z \ y [S{9)\ = S{0) = 0. Under the regularity condi- 
tions in assumption 1. and provided that {0 — 6q) = o p (l), we can apply a Taylor 
series expansion in S{0) around the true parameter vector 9q: 



S(0) = S(G ) + H(0*)(0 - ), 
where 6* lies on the segment joining 6q and 0, and H(6*) = 



(2) 



v-vra 
2^1=1 



0=0* 



T2=iHi{e\k). 



From equations and ^ we obtain 

(0-0 ) = -in- 1 *l 
I l=i 

(0-0 ) = -In-^HiiO*,* 



n 



S(0 ) - S(0)] } 



-1 



1=1 



n- L S(9 ) + O(p 
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In addition, under assumptions 1. and 2., we have that, as n — > oo, n~ l H(6*) — > p 
E y [H(0())], where the expectation is taken with respect to /(y, 0), and H(0*) = 
Y^i=\ Hi(0*,zi). By further assuming that E y {H(6q)} is non-singular, we obtain 



{n^Hie*)}- 1 ^ EyiHie,)}- 1 



It follows that 

(0 - ) 



E y [H{0 )}- 1 \n- 1 S(9 ) + o(p 



0„ 



max n 



where in the last step we use the fact that, under the regularity conditions 1., 
n^SiOo) = Opin- 1 / 2 ), and E y {H(O )} = O p (l). 



Appendix C: Technical details 

Let y = (yx, ■ ■ ■ , y p ) T be a vector of observed binary variables, having a Bernoulli 
distribution with expectation iTj(z),j = 1, • • • ,p. Using the canonical link func- 
tion for Bernoulli distribution we have 



TTj(z) 



exp I aoj + cX jZ 



1 -I- exp ( a j + ajz 



The scale parameter 4>j = 1, such that the conditional distribution of each ob- 
served binary item given the latent variables z is 



9j{Vj\ z ) = exp 



Vj ( a 0j + otj z) + log 



1 



1 + exp ( a j + exjz 



It follows that the approximate log- likelihood function (2.7) results 



£(0) = £>g 



2 ^I T 'I II exp ( q °j + a J z *.*i,-,*j + lo s 



t±,- ■ • ,t q i — 1 



1 + exp ( a 0j + ajz* t 



(27r) 9/2 Cxp ( --Zj,(,.. j.Z,.,, ...,,,) • • • 
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where the AGH nodes and weights are derived by the classical Gauss-Hermite 
nodes Zt k and weights wt k , k = 1, • • • , q, as follows 



« tl , • • • , < t J T = Vvrfa , • • • , z tq f + z. 



and 



«4 = ex p[^ 2 J 



with T; derived by the Cholesky factorization of the matrix that is \Dv = 
T;T^. The modes z; are obtained for each subject through the iterative scheme 

zf +1 = zf + tff L(zf ) 



where "it" denotes the iteration counter, 
<9[log5(yi|z«) +log/i(z/)] 



and 



azF 



<9 2 [log5(y ; |zi) +log/i(z z )] 



z != z« 



i=i 



exp I a j + a?zf 



1 + exp I aoj + a. j z 



dzfdzi 



T 



exp I aoj + ctjzf 



z;=z 



j=i 1 + exp ( a j + «Jzf 



+1. 



In this specific case, the complete data score functions (2.8) are given by 



1 +exp (a j + ajz* ti 



Vjl 



explaoj +ajz* ti> ... 



1 + exp^aoj + ajz*^...^ 

The corresponding score equations have not closed form solutions, and a quasi- 
Newton procedure is used to solve implicit equations. 

In the simulation study, the performance of the adaptive-based estimators has 



been compared with second order Laplace estimators. According to eq. (3.5), the 
latter have been derived by maximizing the following approximated log-likelihood 
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function 



1(6) = £>g [(2vr)^ 2 |* / 1/2 exp[-L(z / )] (l + dp' 1 + 0(p- 2 ) 
1=1 

where the individual modes z/ are obtained through the iterative scheme defined 
above, and 



ci 



m=2 



■1 



itn- 1 



(2m)! 



-L Pl (8,)-"Wi(^' 1 (8 l )--^* m (2j 



with p% \ ■ ■ ■ \pm-l be a partition of 2m indices into m — 1 blocks, each of size 3 
or more, and qi \ ■ ■ ■ \q m is a partition of 2m indices into m blocks, each of size 2. 



In particular, following the notation by Raudenbush et al. (2000), 

1 5 
ci = - -vec T [^i g> *;]t>ec[L (4) (zj)] + — vec T [^i ® (8) *;]t;ec[L (3) (zj) ® L (3) (z*)] 
o 24 

where 



L (3) (z/) = - ^2 vec(ajaj)a 



exp I a 0j + cx~j ii 



1 - exp I a j + cd 



1 + exp ( a 0j + ajz/ 



and 



L^(z/) = — 7;ec[z;ec(Q!jQj)aJ]aJ 



exp I a j + cd z/ 



1 - 4 exp faoj + cejzz) + exp (a j + "Jz/^ 



1 + exp ia j + ctjz; 



As for the adaptive-based estimators, the score equations of both the intercepts 
and factor loadings have not closed form solutions, and a quasi- Newton procedure 
has been used to solve implicit equations. 



